Degraded Document Image Binarization Using Optical Character Recognition
Abstract
The proposed approach applies OCR to retrieve the text in scanned document images. Text detection is based on two machine learning classifiers: one generates candidate word regions and the other filters out non-text regions. Connected components (CCs) are extracted from the image using the maximally stable extremal regions (MSER) algorithm. In CC clustering, AdaBoost classifiers determine whether a region contains text. A binarization method then converts the grayscale image into a binary image. The binarization outcomes are passed to OCR, and the results are evaluated with respect to character and word accuracy. As more and more text documents are scanned, fast and accurate binarization becomes increasingly important. Additional performance metrics are the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and character merging. The effectiveness of the proposed method is also confirmed by tests carried out on realistic document images. The algorithm was implemented in MATLAB version 13.
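The binarize-then-evaluate step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. The abstract does not name the binarization method or define the accuracy formulas, so the sketch makes two assumptions: a global Otsu threshold stands in for the binarization step, and character and word accuracy are computed as 1 minus the edit distance divided by the reference length. All function names (`otsu_threshold`, `binarize`, `char_accuracy`, `word_accuracy`) are hypothetical.

```python
def otsu_threshold(pixels):
    """Global Otsu threshold for a flat list of 8-bit gray values (assumed stand-in method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]            # pixels with value <= t
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # pixels with value > t
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a 2D list of gray values to 0 (dark, e.g. text) or 255 (bright)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[0 if p <= t else 255 for p in row] for row in image]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _accuracy(ref, hyp):
    # Assumed definition: 1 - distance / reference length, clamped at 0.
    if len(ref) == 0:
        return 1.0 if len(hyp) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(ref_text, ocr_text):
    """Character-level accuracy of OCR output against ground-truth text."""
    return _accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    """Word-level accuracy, splitting both texts on whitespace."""
    return _accuracy(ref_text.split(), ocr_text.split())
```

In use, the binarized image would be passed to an OCR engine and its output compared against ground-truth text with `char_accuracy` and `word_accuracy`. The other metrics listed in the abstract (broken and missed text, false alarms, and so on) are not covered by this sketch.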
Similar resources
A Quad Tree Based Binarization Approach to Improve quality of Degraded Document Images
This paper proposes a novel binarization algorithm for converting grayscale and color images into black-and-white images. Binarization is one of the most important processes in research on document image processing and pattern recognition. Since the quality of the binary image plays a critical role in the further processing of the document, especially in the ...
Binarization of Document Image
Document image binarization is performed in the preprocessing stage for document analysis and it aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR). Though document image binarization has been studied for many years, t...
A Proposed Binarization Technique on Hand written document
Abstract: Binarization is performed in the preprocessing stage for document inspection. Binarization of degraded document images improves results for documents affected by poor paper quality, the printing process, ink blots, and fading, and removes noise. In recent years, libraries have begun to digitize historical documents that are of interest to a wide range of people, with the goal of ...
Optical Character Recognition from Degraded Document Images
Segmentation of text from badly degraded document images is a very challenging task due to the high inter/intra variation between the document background and the foreground text of different types of document images. In this paper, a novel document image binarization technique is used to address the issues in degraded document images by using adaptive image contrast. The adaptive image...
Document Image Binarization Using Retinex and Global Thresholding
Document images are usually degraded in the course of photocopying, faxing, printing, or scanning. Degradation problems seem negligible to human eyes but can be responsible for an abrupt decline in the accuracy of the current generation of optical character recognition (OCR) systems. In this paper we present a binarization method based on retinex theory followed by a global threshold. The proposed...
Degraded Document Image Binarization Techniques
Document Image Binarization is performed in the preprocessing stage for document analysis and it aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR) and Document Image Retrieval (DIR). This research area has been studied...